Module 6 - Processing Strings and Text
Learning Objectives
- Learn to use regular expressions (regex) to find text
- Process and transform strings in R
- Work with factors for categorical data
Readings
- RDS (R for Data Science): Chapters 14, 15, 16
Additional Resources:
Regular expressions are useful in a large number of applications. The first website provides an advanced tutorial; the second provides fun exercises. If you want to learn even more, regular expressions are equivalent in expressive power to a computer science concept known as “finite state automata”. You can dive much deeper by reading these lecture notes:
stringr package: reference index. This is the tidyverse package for processing strings. If you are stuck on a string problem, the solution is probably somewhere in here.
forcats package: reference index. This is the reference for a very useful package for handling categorical data.
Wrangling Categorical Data in R. A. McNamara and N. Horton. The American Statistician (2018) Vol 72.
This paper discusses both tidyverse and base R approaches to categorical data.
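As a rough preview of how the stringr and forcats packages listed above fit together, here is a minimal sketch. The `responses` vector is made-up illustration data, not taken from the readings, and the variable names are hypothetical; the sketch assumes both packages are installed.

```r
library(stringr)
library(forcats)

# Hypothetical free-text data, invented for illustration only
responses <- c("apple pie", "Banana split", "cherry tart", "apple crumble")

# Regex matching: "^a" matches strings that start with a lowercase "a"
starts_with_a <- str_detect(responses, "^a")

# Regex extraction: "^\\w+" captures the first run of word characters
first_word <- str_extract(responses, "^\\w+")

# String transformation: normalise case before treating values as categories
fruit <- str_to_lower(first_word)

# Factors: convert to a factor and order the levels by frequency
fruit_f <- fct_infreq(factor(fruit))

# Tabulate how often each level occurs
fct_count(fruit_f)
```

The same steps (detect, extract, normalise, convert to a factor) can also be done in base R with `grepl()`, `regmatches()`, `tolower()`, and `factor()`; the tidyverse versions are used here only because they match the readings.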